Tests of Bayesian Model Selection Techniques for Gravitational Wave Astronomy 
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The analysis of gravitational wave data involves many model selection problems. The most 
important example is the detection problem of selecting between the data being consistent with 
instrument noise alone, or instrument noise and a gravitational wave signal. The analysis of data 
from ground based gravitational wave detectors is mostly conducted using classical statistics, and 
methods such as the Neyman-Pearson criteria are used for model selection. Future space based 
detectors, such as the Laser Interferometer Space Antenna (LISA), are expected to produced rich 
data streams containing the signals from many millions of sources. Determining the number of 
sources that are resolvable, and the most appropriate description of each source poses a challenging 
model selection problem that may best be addressed in a Bayesian framework. An important class of 
LISA sources are the millions of low-mass binary systems within our own galaxy, tens of thousands 
of which will be detectable. Not only are the number of sources unknown, but so are the number 
of parameters required to model the waveforms. For example, a significant subset of the resolvable 
galactic binaries will exhibit orbital frequency evolution, while a smaller number will have measurable 
eccentricity. In the Bayesian approach to model selection one needs to compute the Bayes factor 
between competing models. Here we explore various methods for computing Bayes factors in the 
context of determining which galactic binaries have measurable frequency evolution. The methods 
explored include a Reverse Jump Markov Chain Monte Carlo (RJMCMC) algorithm, Savage-Dickie 
density ratios, the Schwarz-Bayes Information Criterion (BIC), and the Laplace approximation to 
the model evidence. We find good agreement between all of the approaches. 



I. BACKGROUND 

Bayesian statistical techniques are becoming increas- 
ingly popular in gravitational wave data analysis, and 
have shown great promise in tackling the various difficul- 
ties of gravitational wave (GW) source extraction from 
modeled data for the Laser Interferometer Space Antenna 
(LISA) . A powerful tool in the suite of Bayesian methods 
is that of quantitative model selection [HQ. To under- 
stand why this is a valuable feature consider a scenario 
where one is attempting to fit data with two competing 
models of differing dimension. In general, a higher di- 
mensional model will produce a better fit to a given set 
of data. This can be taken to the limit where there are as 
many model parameters as there are data points allowing 
one to perfectly match the data. The problem then is to 
decide how many parameters are physically meaningful 
and to select the model containing only those parameters. 
In the context of GW detection these extra parameters 
could be additional physical parameters used to model 
the source or additional sources in the data. If a model 
is over-parameterized it will over-fit the data and produce 
spurious results. 

Many of the model selection problems associated with 
LISA astronomy involve nested models, where the sim- 
pler model forms a subset of the more complicated model. 
The problem of determining the number of resolvable 
galactic binaries, and the problem of determining the 
number of measurable source parameters, are both ex- 
amples of nested model selection. One could argue that 
the later is better described as "approximation selection" 
since we are selecting between different paramctcrizations 
of the full 17 dimensional physical model that describes 
the signals from binary systems of point masses in general 



relativity. However, many similar modeling problems in 
astrophysics and cosmology |2j, as well as in other fields 
such as geophysics [3|, are considered to be examples of 
model selection, and we will adopt that viewpoint here. 

The LISA observatory Q is designed to explore the 
low frequency portion of the gravitational wave spectrum 
between ~ 0.1 — > 100 mHz. This frequency region will be 
heavily populated by signals from galactic binary systems 
composed of stellar mass compact objects (e.g. white 
dwarfs and neutron stars), of which millions arc theorized 
to exist. Tens of thousands of these GW sources will 
be resolvable by LISA and the remaining sources will 
contribute to a confusion- limited background This 
is expected to be the dominant source of low frequency 
noise for LISA. 

Detection and subsequent regression of the galactic 
foreground is of vital importance in order to then pur- 
sue dimmer sources that would otherwise be buried by 
the foreground. Because of the great number of galac- 
tic sources, and the ensuing overlap between individ- 
ual sources, a one-by-one detection/regression is inaccu- 
rate [6| . Therefore a global fit to all of the galactic sources 
is required. Because of the uncertainty in the number of 
resolvable sources one can not fix the model dimension a 
priori which presents a crucial model selection problem. 
Over-fitting the data will result in an inaccurate regres- 
sion which would then remove power from other sources 
in the data-stream, negatively impacting their detection 
and characterization. The Reverse Jump Markov Chain 
Monte Carlo approach to Bayesian model selection has 
been used to determine the number of resolvable sources 
in the context of a toy problem 7, 8] which shares some 
of the features of the LISA foreground removal prob- 
lem. Meanwhile the Laplace approximation to Bayesian 
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model selection has been employed to estimate the num- 
ber of resolvable sources as part of a MCMC based al- 
gorithm to extracting signals from simulated LISA data 
streams [(| Q . 

Another important problem for GW astronomy is the 
determination of which parameters need to be included in 
the description of the waveforms. For example, the GW 
signal from a binary inspiral, as detected by LISA, may 
involve as many as 17 parameters. However, for massive 
black hole binaries of comparable mass we expect the ec- 
centricity to be negligible, reducing the model dimension 
to 15, while for extreme mass ratio systems we expect 
the spin of the smaller body to have little impact on 
the waveforms, reducing the model dimension to 14. In 
many cases the inspiral signals may be described by even 
fewer parameters. For low mass galactic binaries spin 
effects will be negligible (removing six parameters), and 
various astrophysical processes will have circularized the 
orbits of the majority of systems (removing two param- 
eters). Of the remaining nine parameters, two describe 
the frequency evolution - e.g. the first and second time 
derivatives of the GW frequency, which may or may not 
be detectable (27j j . 

Here we investigate the application of Bayesian model 
selection to LISA data analysis in the context of deter- 
mining the conditions under which the first time deriva- 
tive of the GW frequency, /, can be inferred from the 
data. We parameterize the signals using the eight pa- 
rameters 



(1) 



where A is the amplitude, / is the initial gravitational 
wave frequency, 9 and ef> are the ecliptic co-latitude and 
longitude, ip is the polarization angle, l is the orbital 
inclination of the binary and ipo is the GW phase. The 
parameters /, / and tpo are evaluated at some fiducial 
time (e.g. at the time the first data is taken). For our 
analysis only a single source is injected into the simulated 
data streams. In the frequency domain the output s(f) 
in channel a can be written as 



s a (f) = h a (X) + n a (f) 



(2) 



where h a (A) is the response of the detector to the incident 
GW and h a (f) is the instrument noise. For our work we 
will assume stationary Gaussian instrument noise with no 
contribution from a confusion background. In our anal- 
ysis we use the noise orthogonal A, E, T data streams, 
which can be constructed from linear combinations of 
the Michelson type X, Y, Z signals: 
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T = - (X + Y + Z). 



(3) 



This set of A, E, T variables differ slightly from those 
constructed from the Sagnac signals [lj|- We do not use 



the T channel in our analysis as it is insensitive to GWs 
at the frequencies we are considering. The noise spectral 
density in the A, E channels has the form 

Sn(f) = | (1 - 008(2///,)) ((2 + cos(///*))S s 

+2(3 + 2 cos(///*) + cos(2///*))S„) Hz" 1 (4) 

where /* = 1/2ttL, and the acceleration noise S a and 
shot noise S s are simulated at the levels 
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Here L is the LISA arm length (» 5 x 10 9 m). 

Of central importance to Bayesian analysis is the pos- 
terior distribution function (PDF) of the model parame- 
ters. The PDF p(X\s) describes the probability that the 
source is described by parameters A given the signal s. 
According to Bayes' Theorem, 



p(X\s) 



p(X)p(s\X) 
J dX p(X) P (s\X) 



(6) 



where p(X) is the a priori, or prior, distribution of the 
parameters A andp(s|A) is the likelihood that we measure 
the signal s if the source is described by the parameters A. 
We define the likelihood using the noise weighted inner 
product 
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where the normalization constant C depends on the 
noise, but not the GW signal. One goal of the data anal- 
ysis method is to find the parameters A which maximizes 
the posterior. Markov Chain Monte Carlo (MCMC) 
methods are ideal for this type of application [11]. The 
MCMC algorithm will simultaneously find the param- 
eters which maximize the posterior and accurately map 
out the PDF of the parameters. This is achieved through 
the use of a Metropolis-Hastings [H, EH exploration of 
the parameter space. A brief description of this process 
is as follows: The chain begins at some random position 
x in the parameter space and subsequent steps are made 
by randomly proposing a new position in the parame- 
ter space y. This new position is determined by drawing 
from some proposal distribution q(x\y). The choice of 
whether or not adopt the new position y is made by cal- 
culating the Hastings ratio (transition probability) 



min |l 



p(y)p(s\y)q(y\x) ■ 

p(x)p(s\x)q(x\y) . 



(9) 
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and comparing a to a random number (3 taken from a 
uniform draw in the interval [0:1]. If a exceeds [3 then 
the chain adopts y as the new position. This process 
is repeated until some convergence criterion is satisfied. 
The MCMC differs from a Metropolis extremization by 
forbidding proposal distributions that depend on the past 
history of the chain. This ensures that the progress of the 
chain is Markovian and therefore statistically unbiased. 
Once the chain has stabilized in the neighborhood of the 
best fit parameters all previous steps of the chain are 
excluded from the statistical analysis (these early steps 
are referred to as the "burn in" phase of the chain) and 
henceforth the number of iterations the chain spends at 
different parameter values can be used to infer the PDF. 

The power of the MCMC is two-fold: Because the al- 
gorithm has a finite probability of moving away from a 
favorable location in the parameter space it can avoid 
getting trapped by local features of the likelihood sur- 
face. Meanwhile, the absence of any "memory" within 
the chain of past parameter values allows the algorithm 
to blindly statistically explore the region in the neigh- 
borhood of the global maximum. It is then rigorously 
proven that an MCMC will (eventually) perfectly map 
out the PDF, completely removing the need for user in- 
put to determine parameter uncertainties or thresholds. 

The parameter vector that maximizes the posterior dis- 
tribution is stored as the maximum a posteriori (MAP) 
value and is considered to be the best estimate of the 
source parameters. Note that because of the prior weight- 
ing in the definition of the PDF this is not necessarily 
the A that yields the greatest likelihood. Upon obtaining 
the MAP value for a particular model X the PDF, now 
written as p{X, X\s), can be employed to solve the model 
selection problem. 



II. BAYES FACTOR ESTIMATES 

The Bayes Factor Bxy [3 is a comparison of the ev- 
idence for two competing models, X and Y, where 



Bxy 


2 log Bxy 


Evidence for model X 


< 1 


< 


Negative (supports model Y) 


1 to 3 


to 2 


Not worth more than a bare mention 


3 to 12 


2 to 5 


Positive 


12 to 150 


5 to 10 


Strong 


> 150 


> 10 


Very Strong 



TABLE I: Bxy 'confidence' levels taken from [lj] 



to be inefficient. To employ this powerful statistical tool 
various estimates for the Bayes Factor have been devel- 
oped that have different levels of accuracy and computa- 
tional cost [H, 0| ■ We have chosen to focus on four such 
methods: Reverse Jump Markov Chain Monte Carlo and 
Savage-Dickie density ratios, which directly estimate the 
Bayes factor, and the Schwarz-Bayes Information Cri- 
terion (BIC) and Laplace approximations of the model 
evidence. 



A. RJMCMC 

Reverse Jump Markov Chain Monte Carlo (RJMCMC) 
algorithms are a class of MCMC algorithms which admit 
"trans-dimensional" moves between models of different 
dimension 0, Qjl, [l6| . For the trans-dimensional imple- 
mentation applicable to the LISA data analysis problem 
the choice of model parameters becomes one of the search 
parameters. The algorithm proposes parameter 'birth' or 
'death' moves (proposing to include or discard the 'extra' 
parameter(s)) while holding all other parameters fixed. 
The priors in the RJMCMC Hastings ratio 



a = mm 



p{\ Y )p{s\\Y)g(uY) i 
p(Xx)p(s\X x )g[ux) 



(12) 



px(s) = / d\p(X,X\s) 



(10) 



automatically penalizes the posterior density of the 
higher dimensional model, which compensate for its 
generically higher likelihood, serving as a built in 'Occam 
Factor.' The g(u) which appears in (fT2"|) is the distribu- 
tion from which the random numbers u are drawn and 
I J I is the Jacobian 



is the marginal likelihood, or evidence, for model X. The 
Bayes Factor can then be calculated by 



B XY {s) 



Px(s) 
p Y (s) ' 



(11) 



The Bayes Factor has been described as the Holy Grail 
of model selection: It is a powerful entity that is very 
difficult to find. The quantity Bxy can be thought of as 
the odds ratio for a preference of model X over model Y. 
Apart from carefully concocted toy problems, direct cal- 
culation of the evidence, and therefore Bxy, is imprac- 
tical. To determine Bxy the integral required to com- 
pute px can not generally be solved analytically and for 
high dimension models Monte-Carlo integration proves 



d(X Y ,u Y ) 



d{X x ,ux) 



(13) 



The chain will tend to spend more iterations using the 
model most appropriately describing the data, making 
the decision of which model to favor a trivial one. To 
quantitatively determine the Bayes Factor one simply 
computes the ratio of the iterations spent within each 
model. 



B 



XY 



# of iterations in model X 

# of iterations in model Y 



(14) 



Like the simpler MCMC methods, the RJMCMC is guar- 
anteed to converge on the correct value of Bxy making 
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FIG. 1: 50 000 iteration segment of an RJMCMC chain mov- 
ing between models with and without frequency evolution. 
This particular run was for a source with q — 1 and SNR=10 
and yielded Bxy ~ 1. 

it the 'gold standard' of Bayes Factor estimation. And, 
like regular MCMCs, the convergence can be very slow, so 
that in practice the Bayes Factor estimates can be inaccu- 
rate. This is especially true when the trans-dimensional 
moves involve many parameters, such as the 7 or 8 di- 
mensional jumps that are required to transition between 
models with differing numbers of galactic binaries. 

Figure 1 shows the output of a RJMCMC search of a 
simulated LISA data stream containing the signal from a 
galactic binary with q = fT 2 hs = 1 and and observation 
time of T bs = 2 years. The chain moved freely between 
the 7-dimensional model with no frequency evolution and 
the 8-dimensional model which included the frequency 
evolution. 



The Fisher Information Matrix (FIM) T with compo- 
nents 

dh 

T i:j ee (h,i\h,j) where h,i= — (17) 
can be used as a quadratic approximation to H yielding 

P x(s)~p(X MAP ,X\s) { -j=L= (18) 

We will refer to this estimate of the evidence as the 
Laplace-Fisher (LF) approximation. The LF approxima- 
tion breaks down if the priors have large gradients in the 
vicinity of the MAP parameter estimates. The FIM esti- 
mates can also be poor if some of the source parameters 
are highly correlated, or if the quadratic approximation 
fails. In addition, the FIM approximation gets progres- 
sively worse as the SNR of the source decreases. 

A more accurate (though more costly) method for esti- 
mating the evidence is the Laplace-Metropolis (LM) ap- 
proximation which employs the PDF as mapped out by 
the MCMC exploration of the likelihood surface to es- 
timate H 16]. This can be accomplished by fitting a 
minimum volume ellipsoid (MVE) to the D-dimensional 
posterior distribution function. The principle axes of the 
MVE lie in eigen-directions of the distribution which gen- 
erally do not lie along the parameter directions. Here we 
employ the MVE . j ar package which utilizes a genetic al- 
gorithm to determine the MVE of the distribution and 
returns the covariance matrix of the PDF [T3|. The de- 
terminant of the covariance matrix can then be used to 
determine the evidence via 

p x (s) ~ p(X M ap, V| S )(2vr) I? /VdetC. (19) 

In the MCMC literature the LM approximation is gen- 
erally considered to be second only to the RJMCMC 
method for estimating Bayes Factors. 



B. Laplace Approximation 



C. Savage Dickie Density Ratio 



A common approach to model selection is to approx- 
imate the model evidence directly. Working under the 
assumption that the PDF is Gaussian, the integral in 
equation (|10[) can be estimated by use of the Laplace 
approximation. This is accomplished by comparing the 
volume of the models parameter space V to that of the 
parameter uncertainty ellipsoid AV 



p x (s) ~ p(X 



MAP 



X\s 



>( 



AV: 



X 



Vx 



(15) 



The uncertainty ellipsoid can be determined by calculat- 
ing the determinant of the Hessian TL of partial deriva- 
tives of the posterior with respect to the model parameter 
evaluated at the MAP value for the parameters. 



Px(s) ~p(A 



MAP 



X\s 



(2tt 



,D/2 



VdetH 



(16) 



Both RJMCMC and LM require exploration of the pos- 
terior for each model under consideration. The Savage- 
Dickie (SD) approximation estimates the Bayes Factor 
directly while only requiring exploration of the highest di- 
mensional space |2ill8|. This approximation requires that 
two conditions are met: Model X must be nested within 
Model Y (adding and subtracting parameters clearly sat- 
isfies this condition) and the priors for any given model 
must be separable (i.e. p(X) = p(X 1 )xp(X 2 ) x . . . xp(X D )) 
which is, to a good approximation, satisfied in our exam- 
ple. The Bayes Factor Bxy is then calculated by com- 
paring the weight of the marginalized posterior to the 
weight of the prior distribution for the 'extra' parameter 
at the default, lower-dimensional, value for the parameter 
in question. 



Bxy{s) 



p{Xq\s) 
p(Ao) 



(20) 
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It is interesting to note that if the above conditions are 
precisely satisfied it can then be shown that this is an 
exact calculation of Bxy (assuming sufficient sampling 
of the PDF), as opposed to an approximation. 



TABLE II: Source parameters 



/ (mHz) 


cos 9 


cj> (deg) 


H> (deg) 


COS I 


V?o (deg) 


q 


Tobs (yr) 


5.0 


1.0 


266.0 


51.25 


0.17 


204.94 


l 
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D. Schwarz-Bayes Information Criterion 

All of the approximations discussed so far depend on 
the supplied priors p(A). The Schwarz-Bayes Informa- 
tion Criterion (BIC) method is an approximation to the 
model evidence which makes its own assumptions about 
the priors - namely that they take the form of a multi- 
variate Gaussian with covariance matrix derived from the 
Hessian Ti [l6|, [l9j . The BIC estimate for the evidence is 
then 



D 



lnpx(s) ~ lnp(A M AP, X\s) ^-lnN eS 



(21) 



where Dx is the dimension of model X and N e g is the 
effective number of samples in the data. For our tests we 
defined N e g to be the number of data points required to 
return a (power) signal-to-noise ratio of SNR 2 — D, where 
SNR is the signal-to-noise one gets by summing over the 
entire LISA band. This choice was motivated by the 
fact that the variance in SNR 2 is equal to D 2 , so N e g 
accounts for the data points that carry significant weight 
in the model fit. The BIC estimate has the advantage of 
being very easy to calculate, but is generally considered 
less reliable than the other techniques we are using. 



III. CASE STUDY 

To compare the various approximations to the Bayes 
Factor we simulated a 'typical' galactic binary. The in- 
jected parameters for our test source can be found in ta- 
bleJTTl Since / ~ f 11 ^ 3 , higher frequency sources are more 
likely to have a measurable /. On the other hand, the 
number of binaries per frequency bin falls of as ~ / _11 ^ 3 , 
so high frequency systems are fairly rare. As a compro- 
mise, we selected a system with a GW frequency of 5 
mHz. To describe the frequency evolution we introduced 
the dimensionless parameter 



/^obs> 



(22) 



which measures the change in the Barycenter GW fre- 
quency in units of 1/T b s frequency bins. For q 1 it 
is reasonable to believe that a search algorithm will have 
no difficulty detecting the frequency shift. Likewise, for 
q <C 1 it is unlikely that the frequency evolution can be 
detected (at least for sources with modest SNR). There- 
fore we have selected q ~ 1 to test the model selection 
techniques. Achieving q = 1 for typical galactic binaries 
at 5 mHz requires observation times of approximately 
two years. A range of SNRs were explored by varying 
the distance to the source. 



We can rapidly calculate accurate waveform templates 
using the fast-slow decomposition described in the Ap- 
pendix. Our waveform algorithm has been used in the 
second round of Mock LISA Data Challenges to sim- 
ulate the response to a galaxy containing some 26 million 
sources. The simulation takes just a few hours to run on 
a single 2 GHz processor. 

We simulated a 1024 frequency bin snippet of LISA 
data around 5 mHz that included the injected signal 
and stationary Gaussian instrument noise. The Markov 
chains were initialized at the injected source parameters 
as the focus of this study is the statistical character of 
the detection, and not the initial detection (a highly ef- 
ficient solution to the detection problem is described in 
Ref. @). We used uniform priors for all of the parame- 
ters, with the angular parameters taking their standard 
ranges. We took the frequency to lie somewhere within 
the frequency snippet, and hi A to be uniform across 
the range \ ln(5„/(2T)) and \ ln(10005 n /(2T), which 
roughly corresponds to signal SNRs in the range 1 to 
1000. We took the frequency evolution parameter q to 
be uniformly distributed between -3 and 3 and adopted 
q = as the default value when operating under the 
7-dimensional model. In reality, astrophysical considera- 
tions yield a very strong prior for q (see Section [V]) that 
will significantly impact model selection. We decided to 
use a simple uniform prior to compare the various ap- 
proximations to the Bayes Factor, before moving on to 
consider the effects of the astrophysical prior in Section 

E3 

The choice of proposal distribution q(x\y) from which 
to draw new parameter values has no effect on the asymp- 
totic form of the recovered PDFs, but the choice is cru- 
cially important in determining the rate of convergence to 
the stationary distribution. We took q{x\y) to be a mul- 
tivariate Gaussian with covariance matrix given by the 
inverse of the FIM. In addition to the source parameters 
we included two additional parameters, k A and k E , that 
describe the noise levels in the A and E data channels: 



#(/) = k A S n (f) 
S*{f) = k E S n {f). 



(23) 



In a given realization of the instrument noise k A and 
k E will differ from unity by some random amount 5. The 
quantity S will have a Gaussian distribution with variance 
a 2 = 1 /N, where N is the number of frequency bins being 
analyzed. The likelihood p(s\X) can then be written as 



p( s \\) = C'exp 







[-*(-* 


s-h) 







(24) 



where the constant C is independent of the signal param- 
eters A and the noise parameters k A and k E - We explored 
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the noise level estimation capabilities of our MCMC al- 
gorithm by starting kj± and kg far from unity. As can be 
seen in Figure 2 the chain quickly identified the correct 
noise level. 




FIG. 2: Demonstration of the MCMC algorithm's rapid de- 
termination of the injected noise level. The parameters £u 
and ks were initialized at 10 and 0.1 respectively. 



IV. COMPARISON OF TECHNIQUES 

We compared the Bayes Factor estimates obtained us- 
ing the various methods in two ways. First, we fixed the 
frequency derivative of the source at q = 1 and varied 
the SNR between 5 and 20 in unit increments. Second, 
we fixed the signal to noise ratio at SNR = 12 and varied 
the frequency derivative of the source. 
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FIG. 3: Plot of the Bayes Factor estimates as a function of 
SNR for each of the approximation schemes described in the 
text. 

The results of the first test are shown in Figure 3 We 
see that all five methods agree very well for SNR > 7. 



As expected, the Laplace-Metropolis and Savage-Dickc 
methods provide the best approximation to the "Gold 
Standard" RJMCMC estimate, showing good agreement 
all the way down to SNR = 5. Most importantly, all five 
methods agree on when the 8-dimensional source model is 
favored over the 7-dimensional model, placing the tran- 
sition point at SNR ~ 12.2. To get a rough estimate 
for the numerical error in the various Bayes Factor es- 
timates we repeated the SNR= 15 case 10 times using 
different random number seeds. We found that the nu- 
merical error was enough to account for any quantitative 
differences between the estimates returned by the various 
approaches. 

It is interesting to compare the Bayesian model se- 
lection results to the frequentist "3-cr" rule for positive 
detection: 



\q\ > 3cr„ 



(25) 



where q is the MAP estimate for the frequency change 
and o q is the standard deviation in q as determined by 
the FIM. For the source under consideration we found 
the "3-cr" rule to require SNR ~ 13 for a detection, in 
good agreement with the Bayesian analysis. This lends 
support to Seto's [2lJ earlier FIM based study of the de- 
tectability of the frequency evolution of galactic bina- 
ries, but we should caution that the literature is replete 
with examples where the "3-cr" rule yields results in dis- 
agreement with Bayesian model selection and common 
sense 1221] . 




Favor 7D model 



FIG. 4: Plot of the Bayes Factor estimates as a function of q 
for each of the approximation schemes described in the text. 
The signal to noise ratio was held fixed at SNR = 12. 

The results of the second test are displayed in Figure 4 
In this case all five methods produced results that agree 
to within numerical error. 

While the results shown here are for a particular choice 
of source parameters, we found similar results for other 
sets of source parameters. In general all five methods 
for estimating the Bayes Factor gave consistent results 
for signals with SNR > 7. One exception to this general 
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trend were sources with inclinations close to zero, as then 
the PDFs tend to be highly non-gaussian. The Laplace- 
Metropolis and Laplace-Fisher approximations suffered 
the most in those cases. 



V. ASTROPHYSICAL PRIORS 

Astrophysical considerations lead to very strong pri- 
ors for the frequency evolution of galactic binaries. The 
detached systems, which are expected to account for 
the majority of LISA sources, will evolve under gravita- 
tional radiation reaction in accord with the leading order 
quadrapole formula: 



f 3(8.)^ 

1 40 1 ■ 



(26) 



where M. is the chirp mass. Contact binaries undergoing 
stable mass transfer from the lighter to the heavier com- 
ponent are driven to longer orbital periods by angular 
momentum conservation. The competition between the 
effects of mass transfer and gravitational wave emission 
lead to a formula for / with the same frequency and mass 
scaling as p6| . but with the opposite sign and a slightly 
lower magnitude [23| . 

Population synthesis models, calibrated against obser- 
vational data, yield predictions for the distribution of 
chirp masses M. as a function of orbital frequency. These 
distributions can be converted into priors on q. In con- 
structing such priors one should also fold in observational 
selection effects, which will favor systems with larger 
chirp mass (the GW amplitude scales as M 5 ^ 6 ). To get 
some sense of how such priors will affect the model selec- 
tion we took the chirp mass distribution for detached sys- 
tems at / ~ 5 mHz from the population synthesis model 
described in Ref. [2J|, (kindly provided to us by Gijs Nele- 
mans), and used (|26p to construct the prior on q shown 
in Figures 5 and 6 (observation selection effects were ig- 
nored). The prior has been modified slightly to give a 
small but no- vanishing weight to sources with q = 0. The 
astrophysically motivated prior has a very sharp peak at 
q = 0.64, and we use this value when fixing the frequency 
derivative for the 7-dimensional model. 

To explore the impact on model selection when such a 
strong prior has been adopted we simulated a source with 
q = 1 and varied the SNR. The RJMCMC algorithm was 
applied using chains of length 10 7 in conjunction with 
a fixed 8-dimensional MCMC (also allowed to run for 
10 7 iterations) in order to compare the RJMCMC results 
with the Savage-Dickie density ratio. 

The results of this first exploration are shown in Fig- 
ure 5. We found that for SNR < 15 the marginalized 
PDF very closely resembled the prior distribution. This 
demonstrates that the information content of the data 
is insufficient to change our prior belief about the value 
of the frequency derivative. As the SNR increased, how- 
ever, the PDF began to move away from the prior until 
we reached SNR=30 when the astrophysical prior had a 



0.03 f 
0.025 - 

0.02 - 
0.015 - 

0.01 
0.005 - 

o 

0.03 r 

0.025 - 
0.02 - 

0.015 - 
0.01 

0.005 - 



0.03 
0.025 

0.02 
0.015 

0.01 
0.005 

o i 

0.03 
0.025 

0.02 
0.015 

0.01 
0.005 




0.03 r 
0.025 - 

0.02 - 
0.015 - 

0.01 - 
0.005 - 

0.03 r 

0.025 - 

0.02 - 

0.015 - 
0.01 

0.005 - 



FIG. 5: Comparison between astrophysically motivated prior 
distribution of q for / = 5 mHz and T b s = 2 years (dashed, 
blue) to marginalized PDF (solid, red) for sources injected 
with q = 1 and SNRs varying from 5 to 30. 



SNR 


B XY (SD) 


Bxy (RJMCMC) 


5 


0.926 


1.015 


10 


0.977 


0.996 


15 


0.749 


0.742 


20 


0.427 


0.427 


25 


0.176 


0.177 


30 


0.060 


0.056 



TABLE III: Savage-Dickie density ratio estimates of Bxy for 
sources with q = 1 and SNRs varying from 5 to 30. Compar- 
isons with RJMCMC explorations of the same data set show 
excellent agreement between the two methods. 



negligible effect on the shape of the posterior, signaling 
confidence in the quoted measurement of q. This qualita- 
tive assessment of model preference is strongly supported 
by the Bayes factor estimation made by the RJMCMC 
algorithm as can be seen in Table III. It should also be 
noted that the excellent agreement between the RJM- 
CMC and S-D estimates for Bayes factor Bxy- Both 
methods indicate that for the chosen value of q = L 
the signal-to-noise needs to exceed SNR ~ 25 for the 
8-dimensional model to be favored. This is in contrast 
to the case discussed earlier where a uniform prior was 
adopted for the frequency derivative, and the model se- 
lection methods began showing a preference for the 8- 
dimensional model around SNR=12. 

Figure 6 shows the impact of the astrophysically moti- 
vated prior when the SNR was held at 15 and four differ- 
ent injected values for q were adopted, corresponding to 
the full width at half maximum (FWHM) and full width 
at quarter maximum (FWQM) of the prior distribution. 
The Bayes factors listed in Table IV indicate that for 
modestly loud sources with SNR=15 the model selection 
techniques do not favor updating our estimate of the fre- 
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FIG. 6: Marginalized PDF (solid, red) for fixed SNR=15 in- 
jected sources with q corresponding to FWHM and FWQM 
of the astrophysical prior (dashed, blue) 



q 


Bxy (SD) 


Bxy (RJMCMC) 


0.35 


1.412 


1.414 


0.48 


1.381 


1.388 


0.83 


1.059 


1.052 


1.15 


0.432 


0.428 



TABLE IV: Savage-Dickie and RJMCMC density ratio esti- 
mates of Bxy for sources with SNR=15 and q at FWHM and 
FWQM of astrophysical prior 



quency derivative until the frequency derivative exceeds 
q = 1.2. 

VI. DISCUSSION 



number of sources, as this requires jumps that span seven 
or more dimensions. 

The Laplace-Metropolis method for approximating the 
model evidence is more robust than the commonly used 
Fisher Information Matrix approximation of the Hessian 
of the PDF. Implementing an LM evidence estimation is 
a somewhat costly because of the need to fit the posterior 
to a minimum volume ellipsoid. 

The Savage-Dickie approximation is more economical 
than the RJMCMC or LM methods, but is limited by the 
requirement that the competing models must be nested. 

The Bayes Information Criterion approximation to the 
evidence is by far the cheapest to implement, and is able 
to produce reliable results when the SNR is high. It has 
therefore shown the most promise as an 'on the fly' model 
determination scheme. More thorough (and therefore 
more costly) methods such as RJMCMC and LM could 
then be used to refine the conclusions initially made by 
the BIC. 

Our investigation using a strong astrophysical prior 
indicated that the gravitational wave signals will need 
to have high signal-to-noise (SNR > 25), or moderate 
signal-to-noise (SNR > 15) and frequency derivatives far 
from the peak of the astrophysical distribution, in order 
to update our prior belief in the value of the frequency 
derivative. In other words, the frequency derivative will 
only been needed as a search parameter for a small num- 
ber of bright high frequency sources. 
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We have found that the several common methods for 
estimating Bayes Factors give good agreement when ap- 
plied to the the model selection problem of deciding when 
the data from the LISA observatory can be used to detect 
the orbital evolution of a galactic binary. The methods 
studied require varying degrees of effort to implement and 
calculate, and although found to be accurate in this test 
case, it is clear that some of these methods would be in- 
appropriate approximations for more physically relevant 
examples. 

If a RJMCMC algorithm is used as the sole model 
selection technique, the resistance of the algorithm 
to change dimension, especially when making multi- 
dimensional jumps, can result in invalid model selection 
unless the chains are run for a very large numbers of 
steps. In the examples we studied the transdimensional 
jumps only had to span one dimension, and our basic 
RJMCMC algorithm performed well. However, a more 
sophisticated implementation, using e.g. rejection sam- 
pling or coupled chains, will be required to select the 



Appendix A 

To leading order in the eccentricity, e, the Cartesian 
coordinates of the i th LISA spacecraft are given by [25[ 

Xi(t) — i? cos(a) + ^ei?^ cos(2a — /3i) — 3 cos(/3)^ 

yi (t) = i?sin(a) + iei?(sin(2a-/3 i )-3sin(/3)) 
Zi(t) = -v / 3ei?cos(a - ft) . (27) 

In the above R — 1 AU, is the radial distance of the 
guiding center, a = 2irf m t + k is the orbital phase of the 
guiding center, and ft = 2n(i — l)/3 + A (i — 1, 2, 3) is 
the relative phase of the spacecraft within the constel- 
lation. The parameters k and A give the initial ecliptic 
longitude and orientation of the constellation. The dis- 
tance between the spacecraft is L = Setting 
e = 0.00985 yields L = 5 x 10 9 m. 
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An arbitrary gravitational wave traveling in the k di- 
rection can be written as the linear sum of two indepen- 
dent polarization states 



h(0 = M£)£ + + M£)£ > 



(28) 



where the wave variable £ = t — k ■ x gives the surfaces 
of constant phase. The polarization tensors can be ex- 
panded in terms of the basis tensors e + and e x as 



e + = cos(2-0)e + — sin(2?/')e > 
e x = sin(2'0)e + + cos(2V')e > 

where ip is the polarization angle and 

e + = u (g) m — v ®v 
e x = u ® v + v (8 u . 



(29) 



(30) 



The vectors (u, v, k) form an orthonormal triad which 
may be expressed as a function of the source location on 
the celestial sphere 



u = cos f cos <p x + cos v sin cp y — sin t) z 
v = sin 4>x — cos 4> y 

k = — sin 9 cos <fi £ — sin 9 sin <fi y — cos 6 z . 
For mildly chirping binary sources we have 



h(0 = 5i[(A + £ + + e' l37r < /2 A x £ x 



«'*(0 



(31) 



(32) 



where 



-4> 



2.M(tt/) 2 / 3 
4A4(^/) 2 / 3 



1 



cos 2 t) 



COS t . 



(33) 



Here M. is the chirp mass, Dl is the luminosity dis- 
tance and l is the inclination of the binary to the line 
of sight. Higher post-Newtonian corrections, eccentric- 
ity of the orbit, and spin effects will introduce additional 
harmonics. For chirping sources the adiabatic approxi- 
mation requires that the frequency evolution / occurs on 
a timcscale long compared to the light travel time in the 
interferometer: /// <C L. The gravitational wave phase 
can be approximated as 



*(0 = 27rM + 7rM 2 



(34) 



where fo is the initial phase. The instantaneous fre- 
quency is given by 27r/ = d t ^: 



/ = (/o + M)(i-fc-v). 



(35) 



The general expression for the path length variation 
caused by a gravitational wave involves an integral in 
£ from £j to £/. Writing £ = + 5£ we have 



Thus, we can treat the wave as having fixed frequency 
fo + /o£i f° r the purposes of the integration, and then 
increment the frequency forward in time in the final ex- 
pression [261) . The path length variation is then given 



5£ij(0 = LR d(/,t,A):h(0 



(37) 



where a : b 
given by 



The one-arm detector tensor is 



and the transfer function is 
T{f, t, k) = sh 



(38) 



1 - k ■ hj (t) 



x exp 



/ 



1 - k ■ fij (t) 



(39) 



where /* = l/(2nL) is the transfer frequency and / = 
fo + /o£. The expression can be attacked in pieces. It is 
useful to define the quantities 



4(*) = :e+ 
d*(t) = (^(t)®f«(t)) :e x 

and y tj (t) = 5t ij {t)/(2L). Then 

!/«(*) =K[»^ w (*)e 2wiAt ] ) 

where 



(40) 
(41) 

(42) 



slow 

>'lj 



(*) 



((4(t)(A + (t)cos(2^) 



+e 3 "/ 2 ^l x (i) sin(2^)) 
+d x (t)(e 3 "/ 2 A x (t)cos(2V) 

sin(2^))) e^/o^+^o-aTri/ofe-x)^ ( 43 ) 



It is a simple exercise to derive explicit expressions for the 
antenna functions and the transfer function appearing in 
yf™{t) using (HZJ) and (EH)- 

In the Fourier domain the response can be written as 



»«(*) = H 



V n 



,2nint/T obB \ 2irif t 



(44) 



where the coefficients a n can be found by a numerical 
FFT of the slow terms yf] ow (i). Note that the sum over 
n should extend over both negative and positive values. 
The number of time samples needed in the FFT will de- 
pend on /o and fo and T b s , but is less than 2 9 = 512 
for any galactic sources we are likely to encounter when 
T bs < 2yr. The bandwidth of a source can be estimated 
as 



#(£)-27r(/o+/o&)<5£ + const. 



(36) 



B = 2 (4 + 2^/ o i?sin(0)) f m + f T ohs 



(45) 



10 



The number of samples should exceed 2BT ^ S . The 
Fourier transform of the fast term can be done analyti- 
cally: 



2mfat 



imt/T ob 



where 



and 



b m = Tobs sinc(a; m )e i 



x m = foTobs - rn . 



(46) 



(47) 



(48) 



The cardinal sine function in (|4"6"|) ensures that the 
Fourier components b m away from resonance, x m w 0, are 
quite small. It is only necessary to keep ~ 100 — > 1000 
terms either side of p = [/oTobs] > depending on how bright 
the source is, and how far foT \, s is from an integer. We 
now have 



where 



^ ^ Q"nbj — r 



(49) 



(50) 



The final step is to ensure that our Fourier transform 
yields a real 2/y(t)- This is done by setting the final an- 
swer for the Fourier coefficients equal to dj = (cj+c*_j)/2. 



But since x m never hits resonance for positive j (we 
are not interested in the negative frequency components 
j < 0), we can neglect the second term and simply write 
dj = Cj/2. 

Basically what we are doing is hetrodyning the signal 
to the base frequency /o, then Fourier transforming the 
slowly evolving hetrodyned signal numerically. We then 
convolve these Fourier coefficients with the analytically 
derived Fourier coefficients of the carrier wave. 

The Michelson type TDI variables are given by 



X(t) 


= yu(t - 3i) - yi 3 {t - 


3L) 


+ 2/2i(i - 


2L) 






-y 31 {t - 2L) + y 13 (t 


-L) 


- wit - 


-L) 






+2/31 W ~ 2/21 (t), 








(51) 


Y(t) 


= 2/2a(* - 3L) - 2/21 (t - 


ZL) 


+ 2/3 2 (*- 


2L) 






-yi 2 (t - 2L) + y 21 (t 


-L) 


-y 23 (t- 


-L) 






+2/12(0 - 2/32 (t), 








(52) 


Z(t) 


= 2/31 (t - 3L) ~ y 32 {t - 


3L) 


+ Vw(t - 


2L) 






~y 23 (t - 2L) + y 32 {t 


-L) 


- 2/31 it - 


-L) 






+2/23(0 - 2/13(0- 








(53) 



Note that in the Fourier domain 

Xif) = 2/i2(/)e- 3l/// * -2/i3(/)e- 3l/// * +2/2i(/)e- 2l/// * 
-2/3i(/)e- 24/// * + i 7i3(/)e- l/// * - yi 2 (/)e- l/// * 

+ynif) - miif) ■ (54) 

This saves us from having to interpolate in the time do- 
main. We just combine phase shifted versions of our orig- 
inal Fourier transforms. 
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